We present a search for the standard model Higgs boson in final states with a charged lepton
(electron or muon), missing transverse energy, and two or three jets, at least one of which is identified
as a $b$-quark jet. The search is primarily sensitive to $WH \to \ell\nu b\bar{b}$ production and uses data
corresponding to 9.7 fb$^{-1}$ of integrated luminosity collected with the D0 detector at the Fermilab
Tevatron $p\bar{p}$ Collider at $\sqrt{s} = 1.96$ TeV. We observe agreement between the data and the expected
background. For a Higgs boson mass of 125 GeV, we set a 95% C.L. upper limit on the production of
a standard model Higgs boson of $5.2 \times \sigma_{\rm SM}$, where $\sigma_{\rm SM}$ is the standard model Higgs boson production
cross section, while the expected limit is $4.7 \times \sigma_{\rm SM}$.

PACS numbers: 14.80.Bn, 13.85.Rm 



The Higgs boson is the only fundamental particle in the 
standard model (SM) predicted as a direct consequence of 
the Higgs mechanism describing spontaneous electroweak 
symmetry breaking [iJ-Q. 

The Higgs mechanism generates the masses of the weak gauge bosons
and provides an explanation for the nonzero masses of fermions
generated by their Yukawa couplings to the Higgs field. The mass of
the Higgs boson ($M_H$) is a free parameter in the SM that must be
constrained by experimental results. The direct searches at the CERN
$e^+e^-$ Collider (LEP) Q exclude $M_H < 114.4$ GeV at the 95%
confidence level (C.L.) and precision measurements of other
electroweak parameters constrain $M_H$ to be less than 152 GeV [HH3-.
The region $147 < M_H < 179$ GeV is excluded by the combined analysis
of the CDF and D0 Collaborations 0. The ATLAS and CMS Collaborations
at the CERN Large Hadron Collider (LHC) have excluded much of the
allowed mass range and reported excesses at the 2-3 standard
deviation (s.d.) level for $M_H \approx 125$ GeV H E3-. The
experiments now exclude $111 < M_H < 122$ GeV, $129 < M_H < 559$ GeV
(ATLAS) [11], and $110 < M_H < 122$ GeV, $127 < M_H < 600$ GeV (CMS)
[12]. Both experiments have observed a resonance consistent with SM
Higgs production at $M_H \approx 125$ GeV, primarily in the
$\gamma\gamma$ and $ZZ$ final states, above the 5 s.d. level [11],
[12]. Demonstrating that the observed resonance is due to SM Higgs
boson production requires also observing it in the $b\bar{b}$ final
state, which is the dominant decay mode in this mass range.

The dominant Higgs boson production process at the Tevatron Collider
is gluon-gluon fusion. The associated production of a Higgs boson
with a weak boson occurs at a rate about 3 times lower than the
gluon-gluon fusion production process but is of particular importance
in Higgs boson searches. At masses below $M_H \approx 135$ GeV,
$H \to b\bar{b}$ decays dominate but are difficult to distinguish
from background when the Higgs boson is produced by gluon-gluon
fusion. Instead, associated production of a Higgs boson and a $W$
boson is one of the most sensitive search channels at the Tevatron.

This Letter presents a search based on events with one charged lepton
($\ell = e$ or $\mu$), an imbalance in transverse energy
($\not\!\!E_T$) that arises from the neutrino in the $W \to \ell\nu$
decay, and two or three jets, where one or more of these jets is
selected as a candidate $b$-quark ("$b$-tagged") jet. The search is
also sensitive to $ZH$ production when one of the charged leptons
from the $Z \to \ell^+\ell^-$ decay is not identified. The analysis
is optimized by subdividing the data into channels with different
background compositions and signal-to-background ratios based on
lepton flavor, jet multiplicity, and the number and quality of
candidate $b$-quark jets.

Several searches for $WH \to \ell\nu b\bar{b}$ production have
already been reported at a $p\bar{p}$ center-of-mass energy of
$\sqrt{s} = 1.96$ TeV, most recently by the CDF Collaboration ljEil
and by the D0 Collaboration. Previous searches use subsamples of the
data presented in this Letter with integrated luminosities up to
5.3 fb$^{-1}$. We present an updated search using a multivariate
approach with the full dataset which, after imposing data quality
requirements, corresponds to an integrated luminosity of
9.7 fb$^{-1}$.

This analysis uses most of the major components of the D0 detector,
described in detail in Refs. [19-22]. Events in the electron channel
are selected with triggers requiring an electromagnetic object in the
calorimeter or an electromagnetic object with additional jets. In the
muon channel we use a mixture of single muon, muon plus jet,
$\not\!\!E_T$ plus jet, and multijet triggers. We correct simulated
events for trigger efficiency by using a method similar to that
described in Ref. [18].

Several SM processes produce or can mimic a final state with a
charged lepton, $\not\!\!E_T$, and jets, including diboson ($WW$,
$WZ$, and $ZZ$), $V$+jets ($V = W$ or $Z$), $t\bar{t}$, single top
quark, and multijet (MJ) production. We estimate the MJ background
from data and other backgrounds from simulation. The $V$+jets and
$t\bar{t}$ samples are simulated with the ALPGEN [23] Monte Carlo
(MC) generator interfaced to PYTHIA [24] for parton showering and
hadronization. ALPGEN samples are produced by using the MLM
parton-jet matching prescription [23]. The $V$+jets samples contain
$V + jj$ (where $j = u, d, s,$ or $g$) and $V + cj$ (together denoted
as "$V$+light-flavor") processes, and $V + b\bar{b}$ and
$V + c\bar{c}$ (together denoted as "$V$+heavy-flavor"), generated
separately from $V$+light-flavor. PYTHIA is used to simulate the
production of dibosons ($WW$, $WZ$, and $ZZ$) and all signal
processes. Single top quark events are generated with the SINGLETOP
event generator [25], [26] using PYTHIA for parton evolution and
hadronization. Simulation of background and signal processes uses the
CTEQ6L1 [27, 28] leading-order (LO) parton distribution functions.
Events are processed through a full D0 detector simulation based on
GEANT [29]. To account for multiple $p\bar{p}$ interactions, all
generated events are overlaid with an event from a sample of random
beam crossings with the same instantaneous luminosity profile as the
data. Events are then reconstructed by using the same software as is
used for the data.

The signal cross sections and branching fractions $B$ are normalized
to the SM predictions. Next-to-LO (NLO) cross sections are used for
single top quark [30] and diboson [31, 32] production and approximate
next-to-NLO (NNLO) for $t\bar{t}$ production [33]. The $V$+jets
processes are normalized to the NNLO cross section [34] with
MSTW2008 NNLO parton distribution functions [35]. The
$V$+heavy-flavor events are corrected by using the NLO to LO ratio
obtained from the Monte Carlo program MCFM [32, 36]. We compare the
data with the prediction for $V$+jets production and find a relative
data to MC normalization factor of 1.0 ± 0.1, obtained after
subtracting all other expected background processes and before $b$
tagging.

This analysis begins with the selection of events with exactly one
charged lepton, either an electron with transverse momentum
$p_T > 15$ GeV and pseudorapidity [37] $|\eta| < 1.1$ or
$1.5 < |\eta| < 2.5$ or a muon with $p_T > 15$ GeV and
$|\eta| < 2.0$. Events are also required to have
$\not\!\!E_T > 15$ (20) GeV for the electron (muon) channel and two
or three jets with $p_T > 20$ GeV (after calibration of the jet
energy [38]) and $|\eta| < 2.5$. $\not\!\!E_T$ is calculated from the
energy deposits in the calorimeter cells and is corrected for the
presence of muons [18].
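The kinematic requirements above can be sketched as a simple event filter. This is only an illustration of the quoted thresholds: the `Lepton` and `Jet` containers and the function names are hypothetical, and lepton identification, isolation, and trigger requirements are not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Lepton:
    flavor: str  # "e" or "mu" (hypothetical labels)
    pt: float    # transverse momentum in GeV
    eta: float   # pseudorapidity


@dataclass
class Jet:
    pt: float    # transverse momentum in GeV, after energy calibration
    eta: float   # pseudorapidity


def lepton_passes(lep: Lepton) -> bool:
    """Apply the lepton pT and |eta| requirements quoted in the text."""
    abs_eta = abs(lep.eta)
    if lep.pt <= 15.0:
        return False
    if lep.flavor == "e":
        return abs_eta < 1.1 or 1.5 < abs_eta < 2.5
    if lep.flavor == "mu":
        return abs_eta < 2.0
    return False


def event_passes(leptons: List[Lepton], jets: List[Jet], met: float) -> bool:
    """Exactly one selected lepton, a MET cut, and two or three selected jets."""
    good_leptons = [lep for lep in leptons if lepton_passes(lep)]
    if len(good_leptons) != 1:
        return False
    # MET threshold is 15 GeV in the electron channel, 20 GeV in the muon channel.
    met_min = 15.0 if good_leptons[0].flavor == "e" else 20.0
    if met <= met_min:
        return False
    good_jets = [jet for jet in jets if jet.pt > 20.0 and abs(jet.eta) < 2.5]
    return len(good_jets) in (2, 3)
```

For example, an event with one 30 GeV central electron, two selected jets, and 18 GeV of missing transverse energy passes, while the same event with a muon in place of the electron fails the 20 GeV requirement.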

Electron candidates are identified based on a multivari- 
ate discriminant that uses information from the central 
tracker, preshower detectors, and calorimeter. Muon can- 
didates are identified from the hits in the muon system 
that are matched to a central track and must be isolated 
from the energy deposits in the calorimeter. Inefficiencies 
introduced by lepton identification and isolation criteria 
are determined from $Z \to \ell\ell$ data and used to correct the
efficiency in simulated events to match that in the data. 

Jets are reconstructed by using a midpoint cone algorithm [39] with a
radius of $\Delta R = \sqrt{(\Delta y)^2 + (\Delta\phi)^2} = 0.5$,
where $y$ is the rapidity. Differences in efficiency for jet
identification and jet energy resolution between the data and
simulation are applied as corrections to the MC [40].
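The cone radius is a distance in rapidity and azimuth. A minimal sketch of that distance is given below; it is not the midpoint cone algorithm itself, and the function names are illustrative. The only subtlety is that the azimuthal difference must be wrapped into $[-\pi, \pi]$ before it is squared.

```python
import math


def rapidity(energy: float, pz: float) -> float:
    """Rapidity y = 0.5 * ln((E + pz) / (E - pz))."""
    return 0.5 * math.log((energy + pz) / (energy - pz))


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def delta_r(y1: float, phi1: float, y2: float, phi2: float) -> float:
    """Delta R = sqrt((Delta y)^2 + (Delta phi)^2)."""
    return math.hypot(y1 - y2, delta_phi(phi1, phi2))
```

Two objects separated by 0.3 in rapidity and 0.4 in azimuth lie exactly at the cone boundary, $\Delta R = 0.5$.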

Comparison of ALPGEN with other generators and with the data [41]
shows discrepancies in distributions of lepton and jet $\eta$, dijet
angular separations, and the $p_T$ of $W$ and $Z$ bosons for
$V$+jets events. The data are therefore used to correct the ALPGEN
$V$+jets MC events by weighting the simulated distributions of lepton
$\eta$, leading and second-leading jet $\eta$, $\Delta R$ between the
two leading jets, and the $W$ boson $p_T$ through the use of
functions that bring the total simulated background into agreement
with the data before $b$ tagging, similar to the method employed in
Ref.
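The idea of data-driven reweighting can be illustrated with a bin-by-bin ratio. This is a simplified stand-in, not the analysis procedure: the text describes smooth weighting functions, whereas the sketch below uses one weight per histogram bin, and all names are hypothetical.

```python
import bisect
from typing import List, Sequence


def bin_weights(data: Sequence[float], other_bkg: Sequence[float],
                vjets_mc: Sequence[float]) -> List[float]:
    """Per-bin weight w_i = (data_i - other_i) / vjets_i.

    Applying w_i to the V+jets MC makes the total background equal the
    data in each bin. Empty MC bins are left unweighted.
    """
    return [(d - o) / v if v > 0.0 else 1.0
            for d, o, v in zip(data, other_bkg, vjets_mc)]


def event_weight(x: float, edges: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Look up the weight for a simulated event with observable value x."""
    k = bisect.bisect_right(edges, x) - 1
    if 0 <= k < len(weights):
        return weights[k]
    return 1.0  # outside the histogram range: leave the event unweighted


# Toy two-bin lepton-eta histogram (numbers are illustrative only).
eta_edges = [-2.5, 0.0, 2.5]
toy_weights = bin_weights([110.0, 90.0], [10.0, 10.0], [100.0, 100.0])
```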

Multijet backgrounds are estimated from the data fljj. Before
applying $b$ tagging, we perform a fit to the distribution of the
transverse mass Q of the $W$ boson candidate ($M_T^W$) to determine
the normalization of the MJ and $V$+jets backgrounds simultaneously.
To suppress MJ background, events with
$M_T^W < (40 - 0.5 \times \not\!\!E_T)$ GeV are removed in both the
electron and muon channels.
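The requirement above can be sketched as follows, under two stated assumptions: the transverse mass uses the standard massless-lepton definition $M_T^W = \sqrt{2 p_T^\ell \not\!\!E_T (1 - \cos\Delta\phi)}$, and the quantity multiplying 0.5 in the cut is the missing transverse energy. The function names are illustrative.

```python
import math


def w_transverse_mass(lep_pt: float, lep_phi: float,
                      met: float, met_phi: float) -> float:
    """Standard transverse mass of the lepton + missing-ET system (GeV)."""
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(lep_phi - met_phi)))


def passes_triangle_cut(mt_w: float, met: float) -> bool:
    """Keep events with M_T^W >= 40 - 0.5 * MET (all quantities in GeV)."""
    return mt_w >= 40.0 - 0.5 * met
```

The cut removes events in which both the transverse mass and the missing transverse energy are small, the region where multijet events with a misidentified lepton tend to populate. For missing transverse energy of 80 GeV or more the requirement is satisfied for any transverse mass.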

To further suppress the MJ background, we construct a multivariate
discriminator that exploits kinematic differences between the MJ
background and signal. The multivariate discriminator is a boosted
decision tree (BDT) implemented in the TMVA package [42]. The output
distribution in the data is well modeled by the total expected
simulated and MJ backgrounds and is used as one of the inputs to the
final signal discriminant.

The $b$-tagging algorithm for identifying jets originating from $b$
quarks is based on a combination of variables sensitive to the
presence of tracks or secondary vertices displaced significantly from
the $p\bar{p}$ interaction vertex. This algorithm provides improved
performance over an earlier neural network algorithm [43]. The
efficiency is determined for taggable jets, which contain at least
two tracks with each having at least one hit in the silicon
microstrip tracker. The efficiency for jets to satisfy the
taggability and $b$-tagging requirements in the simulation is
corrected to reproduce the data.

Events must have at least one $b$-tagged jet. If exactly one jet is
$b$-tagged, the $b$-identification discriminant output of that jet
must satisfy the tight selection threshold described below. Such
events are classified as having one tight $b$ tag. Events with two or
more $b$-tagged jets are assigned to either the two loose $b$ tags,
two medium $b$ tags, or two tight $b$ tags category, depending on the
value of the average $b$-identification discriminant of the two jets
with the highest discriminant values. The operating point for the
loose (medium, tight) threshold has an identification efficiency of
79% (57%, 47%) for individual $b$ jets, averaged over selected jet
$p_T$ and $\eta$ distributions, with a $b$-tagging misidentification
rate of 11% (0.6%, 0.15%) for light-quark jets, calculated by the
method described in Ref.
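The categorization logic can be sketched as below. The numerical thresholds are placeholders, since the text quotes only the efficiencies of the operating points and not the discriminant values; the sketch also assumes that a jet counts as $b$-tagged once it passes the loose threshold. The function name is hypothetical.

```python
from typing import List, Optional

# Placeholder discriminant thresholds (NOT from the text; illustration only).
LOOSE, MEDIUM, TIGHT = 0.2, 0.5, 0.8


def btag_category(discriminants: List[float]) -> Optional[str]:
    """Assign an event to a b-tag category, or None if it is rejected.

    `discriminants` holds the b-identification output of each jet.
    """
    tagged = sorted((d for d in discriminants if d >= LOOSE), reverse=True)
    if not tagged:
        return None  # no b-tagged jet
    if len(tagged) == 1:
        # A single tagged jet must satisfy the tight threshold.
        return "one tight" if tagged[0] >= TIGHT else None
    # Two or more tagged jets: average the two highest discriminants.
    average = 0.5 * (tagged[0] + tagged[1])
    if average >= TIGHT:
        return "two tight"
    if average >= MEDIUM:
        return "two medium"
    return "two loose"
```

Because each event receives exactly one label, the categories are mutually exclusive and can be combined as independent channels.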

After applying these selection criteria, the expected event yields
for the backgrounds and for a Higgs boson with mass $M_H = 125$ GeV
are compared to the observed number of events in Table I.
Figure 1(a) shows the distribution of the dijet invariant mass, using
the two jets with the highest $b$-identification output, for events
with exactly two jets and all $b$-tagged categories. The data are
well described by the predicted background in all $b$-tag categories.



TABLE I: Summary of event yields for $W$+2 and $W$+3 jets final
states. The number of events in the data is compared with the
expected number of background events. Signal contributions
($M_H = 125$ GeV) are shown for $WH$ and $ZH$ production with
$H \to b\bar{b}$. All listed signal sources are considered when
setting limits. Uncertainties include both statistical and systematic
contributions, as described later in this Letter.

            Pre-$b$-tag          One tight $b$-tag    Two $b$-tags
WH          41.2 ± 3.2           12.5 ± 1.2           17.3 ± 1.7
ZH          4.7 ± 0.4            1.4 ± 0.1            1.9 ± 0.1
VV          6824 ± 678           648 ± 55             256 ± 18
V+lf        206 358 ± 18 624     7149 ± 794           2527 ± 306
V+hf        34 068 ± 4447        6486 ± 1510          3164 ± 739
Top         7222 ± 555           2413 ± 229           2437 ± 238
Multijet    68 366 ± 6668        4634 ± 473           2020 ± 192
All bkg.    322 838 ± 24 756     21 330 ± 2190        10 404 ± 1059
Data        322 836              20 684               10 071
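As a cross-check of Table I, the short script below sums the individual background yields and compares them with the quoted totals. It also computes a naive $s/\sqrt{b}$ figure of merit per category; that quantity is an illustration only and is not used in the analysis, which relies on the full BDT output shapes.

```python
import math

# Central values from Table I: (pre-b-tag, one tight b-tag, two b-tags).
backgrounds = {
    "VV":       (6824, 648, 256),
    "V+lf":     (206358, 7149, 2527),
    "V+hf":     (34068, 6486, 3164),
    "Top":      (7222, 2413, 2437),
    "Multijet": (68366, 4634, 2020),
}
all_bkg_quoted = (322838, 21330, 10404)
data = (322836, 20684, 10071)
signal = tuple(wh + zh for wh, zh in zip((41.2, 12.5, 17.3), (4.7, 1.4, 1.9)))

# Column-wise sum of the background components.
totals = tuple(sum(column) for column in zip(*backgrounds.values()))

# Data-to-background ratio and a naive counting sensitivity per category.
data_over_bkg = tuple(d / b for d, b in zip(data, totals))
s_over_sqrt_b = tuple(s / math.sqrt(b) for s, b in zip(signal, totals))
```

The component sums reproduce the "All bkg." row in every column, and the naive sensitivity increases from the pre-tag sample to the one-tag and two-tag samples, which is the motivation for categorizing events by $b$-tag content.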



To separate the signal and background, we use final BDTs trained on
the $WH \to \ell\nu b\bar{b}$ signal samples and all the SM processes
as background. We train an independent final BDT, using an
individually optimized set of inputs, for each lepton flavor, jet
multiplicity, $b$-tag category, and $M_H$ value considered, with
$M_H$ varying between 100 and 150 GeV in 5 GeV steps. When selecting
input variables, we ensure that each is well modeled and displays
good separation between the signal and one or more backgrounds.
Figures 1(b) and 1(c) show the final BDT output distributions for the
two medium and two tight $b$-tag channels in two-jet events with
electron and muon channels combined.
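To give a sense of scale, the following sketch enumerates the training configurations implied by this description. It assumes that every combination of the listed categories is trained separately; the labels are illustrative.

```python
from itertools import product

lepton_flavors = ("electron", "muon")
jet_multiplicities = (2, 3)
btag_categories = ("one tight", "two loose", "two medium", "two tight")
mass_points_gev = tuple(range(100, 151, 5))  # 100, 105, ..., 150 GeV

# One final BDT per (flavor, jet multiplicity, b-tag category, mass point).
configurations = list(product(lepton_flavors, jet_multiplicities,
                              btag_categories, mass_points_gev))

channels_per_mass_point = len(configurations) // len(mass_points_gev)
```

Under this assumption there are 11 mass points and 16 channels per mass point, giving 176 independently trained discriminants.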

Uncertainties on the normalization and shape of the final BDT output
distributions affect our sensitivity to a potential signal.
Theoretical uncertainties include uncertainties on the $t\bar{t}$ and
single top quark production cross sections (each having a 7%
uncertainty [30, 33]), an uncertainty on the diboson production cross
section (6% [31]), $V$+light-flavor production (6%), and
$V$+heavy-flavor production (20%, estimated from MCFM [32, 36]).

Uncertainties from modeling that affect both the shape and
normalization of the final BDT distributions include uncertainties on
trigger efficiency as derived from the data (3%-5%), lepton
identification and reconstruction efficiency (5%-6%), reweighting of
ALPGEN MC samples (2%), and the MLM matching [23] applied to
$V$+light-flavor events ($\approx 0.5$%). Uncertainties on the ALPGEN
renormalization and factorization scales are evaluated by multiplying
the nominal scale for each, simultaneously, by factors of 0.5 and 2.0
(2%), while uncertainties on the choice of parton distribution
functions (2%) are estimated by using the prescriptions of
Ref. [23, 44].

The experimental uncertainty that affects only the normalization of
the expected signal and simulated backgrounds arises from the
uncertainty on the integrated luminosity (6.1%) [45]. Those that
affect the final BDT distribution shapes include jet taggability (3%
per jet), $b$-tagging efficiency (2.5%-3% per heavy-quark jet), the
light-quark jet misidentification rate (10% per jet), jet
identification efficiency (5%), and jet energy calibration and
resolution (varying between 5% and 15%, depending on the process and
channel). The MJ background model has a contribution from the
statistical uncertainty of the data after tagging (10%-20%).

FIG. 1: (color online). (a) The dijet mass distribution for all $b$-tag categories and two-jet exclusive events. (b) The final BDT
output for two medium $b$-tagged events and (c) two tight $b$-tagged events. Electron and muon channels are combined. The
Higgs boson signal is shown for $M_H = 125$ GeV. Signal events are scaled by a factor of 100 in (a) and 20 in (b) and (c).

To demonstrate measurement of processes with small cross sections in
the same final state as $WH$, we train a discriminant with $WZ$ and
$ZZ$ production as the signal, using the same event selection and
input variables. We observe a 1.0 s.d. excess in the data over the
background expectation, and our expected sensitivity is 1.8 s.d. If
interpreted as a cross section measurement, the resulting scale
factor with respect to the predicted SM value [31, 32] of
4.4 ± 0.3 pb is 0.55 ± 0.36 (stat.) ± 0.37 (syst.).
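The scale factor can be converted into an approximate cross section with a few lines of arithmetic. The sketch assumes that the statistical and systematic uncertainties are independent and can be added in quadrature, and it ignores the 0.3 pb uncertainty on the predicted value; neither assumption is stated in the text.

```python
import math

sm_cross_section_pb = 4.4          # predicted SM WZ+ZZ cross section
scale_factor = 0.55                # measured / predicted
stat_unc, syst_unc = 0.36, 0.37    # uncertainties on the scale factor

# Quadrature sum, valid only if the two uncertainties are independent.
total_unc = math.hypot(stat_unc, syst_unc)

measured_pb = scale_factor * sm_cross_section_pb
measured_unc_pb = total_unc * sm_cross_section_pb
```

Under these assumptions the combined uncertainty on the scale factor is about 0.52, corresponding to roughly 2.4 ± 2.3 pb.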

In the search for the SM Higgs boson we observe no significant excess
relative to the SM expectation and proceed to set upper limits on the
SM Higgs boson production cross section. We calculate all limits at
the 95% C.L. using the modified frequentist $CL_s$ approach with a
Poisson log-likelihood ratio of the signal+background hypothesis to
the background-only hypothesis as the test statistic. We treat
systematic uncertainties as "nuisance parameters" constrained by
their priors, and the best fits of these parameters are determined at
each value of $M_H$ by maximizing the likelihood with respect to the
data. We remove the $V$+jets normalization obtained from the $M_T^W$
distribution and allow the components to vary by the aforementioned
uncertainties of 6% and 20% on $V$+light-flavor and $V$+heavy-flavor
production, respectively. Independent fits are performed to the
background-only and signal-plus-background hypotheses. All
correlations are maintained among channels and between the signal and
background. Figure 2 shows the background-subtracted data along with
the ±1 s.d. systematic uncertainties from the best fit of the
background-only model and the expected signal contribution for all
channels combined, where we combine bins from each channel according
to their $\log_{10}(s/b)$ value in order to group bins with similar
sensitivity. The log-likelihood ratios for the background-only model
and the signal-plus-background model as a function of $M_H$ are shown
in Fig. 3. The observed upper limit on
$\sigma(p\bar{p} \to H + X) \times B(H \to b\bar{b})$ for
$M_H = 125$ GeV is a factor of 5.2 larger than the SM expectation,
and the expected limit is a factor of 4.7. The corresponding observed
and expected limits relative to the SM expectation are given in
Table II.

FIG. 2: (color online). Distribution of the difference between
the data and background expectation of the final BDT
discriminant output for $M_H = 125$ GeV for the background-only
model, shown with statistical uncertainties (points with error
bars). The solid lines represent the ±1 s.d. systematic
uncertainty after constraining with the data. The darker shaded
region is the expected final BDT distribution for a SM Higgs
signal for $M_H = 125$ GeV. Here we combine BDT discriminant
bins from each channel according to the bins' $\log_{10}(s/b)$
values.

FIG. 3: (color online). Log-likelihood ratio for the
background-only model (LLR$_b$, with 1 and 2 s.d. uncertainty
bands), signal+background model (LLR$_{s+b}$) and data
(LLR$_{\rm obs}$) versus $M_H$.
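Two ingredients of the limit-setting procedure lend themselves to a short sketch: the binned Poisson log-likelihood ratio and the regrouping of bins by $\log_{10}(s/b)$. The code below is a simplified illustration that omits the nuisance-parameter fits and the $CL_s$ pseudoexperiments; the function names and toy numbers are assumptions. For Poisson-distributed bin contents, $\mathrm{LLR} = -2\ln[L(s+b)/L(b)] = 2\sum_i [s_i - n_i \ln(1 + s_i/b_i)]$.

```python
import math
from typing import List, Sequence, Tuple

Channel = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def poisson_llr(n: Sequence[float], s: Sequence[float],
                b: Sequence[float]) -> float:
    """LLR = 2 * sum_i [s_i - n_i * ln(1 + s_i / b_i)], no nuisance parameters."""
    return 2.0 * sum(si - ni * math.log1p(si / bi)
                     for ni, si, bi in zip(n, s, b))


def combine_by_log_sb(channels: Sequence[Channel],
                      edges: Sequence[float]) -> List[List[float]]:
    """Sum (s, b, n) over all bins whose log10(s/b) falls in each interval.

    `channels` is a sequence of (signal, background, data) per-bin lists;
    `edges` are the boundaries of the log10(s/b) intervals.
    """
    nbins = len(edges) - 1
    combined = [[0.0, 0.0, 0.0] for _ in range(nbins)]
    for s, b, n in channels:
        for si, bi, ni in zip(s, b, n):
            if si <= 0.0 or bi <= 0.0:
                continue  # log10(s/b) undefined; skip such bins
            x = math.log10(si / bi)
            for k in range(nbins):
                if edges[k] <= x < edges[k + 1]:
                    combined[k][0] += si
                    combined[k][1] += bi
                    combined[k][2] += ni
                    break
    return combined


# Toy input: two channels with (signal, background, data) per bin.
toy_channels = [([1.0, 0.1], [10.0, 100.0], [12, 95]), ([2.0], [20.0], [25])]
toy_combined = combine_by_log_sb(toy_channels, [-4.0, -2.0, 0.0])
```

With this sign convention, background-like data give a positive LLR and signal-like data give a negative LLR, which is how the observed curve in Fig. 3 is read against the two hypotheses.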



TABLE II: The ratio of the observed, $R_{\rm obs}$, and expected,
$R_{\rm expt}$, 95% C.L. upper limit to the SM Higgs boson production
cross section.

$M_H$ (GeV)    100   105   110   115   120   125   130   135   140    145    150
$R_{\rm expt}$ 2.2   2.6   2.9   3.2   3.8   4.7   6.8   8.9   11.7   17.5   25.6
$R_{\rm obs}$  2.8         2.9   3.7   5.0   5.2               15.1   18.8   21.8



In conclusion, we have performed a search for SM Higgs boson
production in $\ell + \not\!\!E_T$ + jets final states using two or
three jets and $b$ tagging with the full Run II data set of
9.7 fb$^{-1}$ of integrated luminosity from the D0 detector. The
results are in agreement with the expected event yield, and we set
upper limits on
$\sigma(p\bar{p} \to H + X) \times B(H \to b\bar{b})$ relative to the
SM Higgs boson cross section $\sigma_{\rm SM}$ for $M_H$ between 100
and 150 GeV, as summarized in Table II. For $M_H = 125$ GeV, the
observed limit normalized to the SM prediction is 5.2 and the
expected limit is 4.7.